System Specifications

..


Methodology


library(ggplot2)  # ggplot(); plotly also attaches it
library(plotly)
library(stats)
library(tidyr)
library(DT)       # datatable()

Results

Input

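Returns: Data frame of the dataset

# Set this path to the data folder of your local copy of the repository
setwd("~/Desktop/HS631_Bio_Informatics/Repo/prediction_of_bodyfat-team-2/data")
# Note: read.csv2() defaults to dec = ",", so columns that use "." as the decimal
# mark are read as character; they are converted to numeric further below
bodyfat<- read.csv2("bodyfat_1.csv", header=TRUE, sep=",") 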
head(as.data.frame(bodyfat),n = 10)
Density BodyFat Age Weight Height Neck Chest Abdomen Hip Thigh Knee Ankle Biceps Forearm Wrist
1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2 94.5 59 37.3 21.9 32 27.4 17.1
1.0853 6.1 22 173.25 72.25 38.5 93.6 83 98.7 58.7 37.3 23.4 30.5 28.9 18.2
1.0414 25.3 22 154 66.25 34 95.8 87.9 99.2 59.6 38.9 24 28.8 25.2 16.6
1.0751 10.4 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2
1.034 28.7 24 184.25 71.25 34.4 97.3 100 101.9 63.2 42.2 24 32.2 27.7 17.7
1.0502 20.9 24 210.25 74.75 39 104.5 94.4 107.8 66 42 25.6 35.7 30.6 18.8
1.0549 19.2 26 181 69.75 36.4 105.1 90.7 100.3 58.4 38.3 22.9 31.9 27.8 17.7
1.0704 12.4 25 176 72.5 37.8 99.6 88.5 97.1 60 39.4 23.2 30.5 29 18.8
1.09 4.1 25 191 74 38.1 100.9 82.5 99.9 62.9 38.3 23.8 35.9 31.1 18.2
1.0722 11.7 23 198.25 73.5 42.1 99.6 88.6 104.1 63.1 41.7 25 35.6 30 19.2
head(bodyfat, 50)
Density BodyFat Age Weight Height Neck Chest Abdomen Hip Thigh Knee Ankle Biceps Forearm Wrist
1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2 94.5 59 37.3 21.9 32 27.4 17.1
1.0853 6.1 22 173.25 72.25 38.5 93.6 83 98.7 58.7 37.3 23.4 30.5 28.9 18.2
1.0414 25.3 22 154 66.25 34 95.8 87.9 99.2 59.6 38.9 24 28.8 25.2 16.6
1.0751 10.4 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2
1.034 28.7 24 184.25 71.25 34.4 97.3 100 101.9 63.2 42.2 24 32.2 27.7 17.7
1.0502 20.9 24 210.25 74.75 39 104.5 94.4 107.8 66 42 25.6 35.7 30.6 18.8
1.0549 19.2 26 181 69.75 36.4 105.1 90.7 100.3 58.4 38.3 22.9 31.9 27.8 17.7
1.0704 12.4 25 176 72.5 37.8 99.6 88.5 97.1 60 39.4 23.2 30.5 29 18.8
1.09 4.1 25 191 74 38.1 100.9 82.5 99.9 62.9 38.3 23.8 35.9 31.1 18.2
1.0722 11.7 23 198.25 73.5 42.1 99.6 88.6 104.1 63.1 41.7 25 35.6 30 19.2
1.083 7.1 26 186.25 74.5 38.5 101.5 83.6 98.2 59.7 39.7 25.2 32.8 29.4 18.5
1.0812 7.8 27 216 76 39.4 103.6 90.9 107.7 66.2 39.2 25.9 37.2 30.2 19
1.0513 20.8 32 180.5 69.5 38.4 102 91.6 103.9 63.4 38.3 21.5 32.5 28.6 17.7
1.0505 21.2 30 205.25 71.25 39.4 104.1 101.8 108.6 66 41.5 23.7 36.9 31.6 18.8
1.0484 22.1 35 187.75 69.5 40.5 101.3 96.4 100.1 69 39 23.1 36.1 30.5 18.2
1.0512 20.9 35 162.75 66 36.4 99.1 92.8 99.2 63.1 38.7 21.7 31.1 26.4 16.9
1.0333 29 34 195.75 71 38.9 101.9 96.4 105.2 64.8 40.8 23.1 36.2 30.8 17.3
1.0468 22.9 32 209.25 71 42.1 107.6 97.5 107 66.9 40 24.4 38.2 31.6 19.3
1.0622 16 28 183.75 67.75 38 106.8 89.6 102.4 64.2 38.7 22.9 37.2 30.5 18.5
1.061 16.5 33 211.75 73.5 40 106.2 100.5 109 65.8 40.6 24 37.1 30.1 18.2
1.0551 19.1 28 179 68 39.1 103.3 95.9 104.9 63.5 38 22.1 32.5 30.3 18.4
1.064 15.2 28 200.5 69.75 41.3 111.4 98.8 104.8 63.4 40.6 24.6 33 32.8 19.9
1.0631 15.6 31 140.25 68.25 33.9 86 76.4 94.6 57.4 35.3 22.2 27.9 25.9 16.7
1.0584 17.7 32 148.75 70 35.5 86.7 80 93.4 54.9 36.2 22.1 29.8 26.7 17.1
1.0668 14 28 151.25 67.75 34.5 90.2 76.3 95.8 58.4 35.5 22.9 31.1 28 17.6
1.0911 3.7 27 159.25 71.5 35.7 89.6 79.7 96.5 55 36.7 22.5 29.9 28.2 17.7
1.0811 7.9 34 131.5 67.5 36.2 88.6 74.6 85.3 51.7 34.7 21.4 28.7 27 16.5
1.0468 22.9 31 148 67.5 38.8 97.4 88.7 94.7 57.5 36 21 29.2 26.6 17
1.091 3.7 27 133.25 64.75 36.4 93.5 73.9 88.5 50.1 34.5 21.3 30.5 27.9 17.2
1.079 8.8 29 160.75 69 36.7 97.4 83.5 98.7 58.9 35.3 22.6 30.1 26.7 17.6
1.0716 11.9 32 182 73.75 38.7 100.5 88.7 99.8 57.5 38.7 33.9 32.5 27.7 18.4
1.0862 5.7 29 160.25 71.25 37.3 93.5 84.5 100.6 58.5 38.8 21.5 30.1 26.4 17.9
1.0719 11.8 27 168 71.25 38.1 93 79.1 94.5 57.3 36.2 24.5 29 30 18.8
1.0502 21.3 41 218.5 71 39.8 111.7 100.5 108.3 67.1 44.2 25.2 37.5 31.5 18.7
1.0263 32.3 41 247.25 73.5 42.1 117 115.6 116.1 71.2 43.3 26.3 37.3 31.7 19.7
1.0101 40.1 49 191.75 65 38.4 118.5 113.1 113.8 61.9 38.3 21.9 32 29.8 17
1.0438 24.2 40 202.25 70 38.5 106.5 100.9 106.2 63.5 39.9 22.6 35.1 30.6 19
1.0346 28.4 50 196.75 68.25 42.1 105.6 98.8 104.8 66 41.5 24.7 33.2 30.5 19.4
1.0202 35.2 46 363.15 72.25 51.2 136.2 148.1 147.7 87.3 49.1 29.6 45 29 21.4
1.0258 32.6 50 203 67 40.2 114.8 108.1 102.5 61.3 41.1 24.7 34.1 31 18.3
1.0217 34.5 45 262.75 68.75 43.2 128.3 126.2 125.6 72.5 39.6 26.6 36.4 32.7 21.4
1.025 32.9 44 205 29.5 36.6 106 104.3 115.5 70.6 42.5 23.7 33.6 28.7 17.4
1.0279 31.6 48 217 70 37.3 113.3 111.2 114.1 67.7 40.9 25 36.7 29.8 18.4
1.0269 32 41 212 71.5 41.5 106.6 104.3 106 65 40.2 23 35.8 31.5 18.8
1.0814 7.7 39 125.25 68 31.5 85.1 76 88.2 50 34.7 21 26.1 23.1 16.1
1.067 13.9 43 164.25 73.25 35.7 96.6 81.5 97.2 58.4 38.2 23.4 29.7 27.4 18.3
1.0742 10.8 40 133.5 67.5 33.6 88.2 73.7 88.5 53.3 34.5 22.5 27.9 26.2 17.3
1.0665 5.6 39 148.5 71.25 34.6 89.8 79.5 92.7 52.7 37.5 21.9 28.8 26.8 17.9
1.0678 13.6 45 135.75 68.5 32.8 92.3 83.4 90.4 52 35.8 20.6 28.8 25.5 16.3
1.0903 4 47 127.5 66.75 34 83.4 70.4 87.2 50.6 34.4 21.9 26.8 25.8 16.8
datatable(head(bodyfat,50))

Returns: Structure of the dataset

getwd()
## [1] "/Users/Yutachen/Desktop/Project_presentation_template"
str(bodyfat)
## 'data.frame':    1008 obs. of  15 variables:
##  $ Density: chr  "1.0708" "1.0853" "1.0414" "1.0751" ...
##  $ BodyFat: chr  "12.3" "6.1" "25.3" "10.4" ...
##  $ Age    : int  23 22 22 26 24 24 26 25 25 23 ...
##  $ Weight : chr  "154.25" "173.25" "154" "184.75" ...
##  $ Height : chr  "67.75" "72.25" "66.25" "72.25" ...
##  $ Neck   : chr  "36.2" "38.5" "34" "37.4" ...
##  $ Chest  : chr  "93.1" "93.6" "95.8" "101.8" ...
##  $ Abdomen: chr  "85.2" "83" "87.9" "86.4" ...
##  $ Hip    : chr  "94.5" "98.7" "99.2" "101.2" ...
##  $ Thigh  : chr  "59" "58.7" "59.6" "60.1" ...
##  $ Knee   : chr  "37.3" "37.3" "38.9" "37.3" ...
##  $ Ankle  : chr  "21.9" "23.4" "24" "22.8" ...
##  $ Biceps : chr  "32" "30.5" "28.8" "32.4" ...
##  $ Forearm: chr  "27.4" "28.9" "25.2" "29.4" ...
##  $ Wrist  : chr  "17.1" "18.2" "16.6" "18.2" ...
# Duplicate-row check: unique() keeps all 1008 rows, so there are no duplicated records
dt_unique <- unique(bodyfat)
nrow(dt_unique)
## [1] 1008
rm(dt_unique)
bodyfat_1 <- bodyfat
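The character columns in the str() output above come from the import step: read.csv2() assumes a comma as the decimal mark. The sketch below reproduces this on a tiny inline CSV. It assumes that the real bodyfat_1.csv uses "." decimals and "," separators, which is what the printed values suggest. It then shows that read.csv() parses the same columns as numbers.

```r
# Minimal sketch: why read.csv2() left most columns as character.
# Assumption: "." is the decimal mark and "," the field separator, as in the
# printed values; a tiny inline CSV stands in for bodyfat_1.csv.
csv_text <- "Density,BodyFat,Age,Weight
1.0708,12.3,23,154.25
1.0853,6.1,22,173.25"
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# read.csv2() defaults to dec = ",", so "1.0708" is not recognised as a number
as_csv2 <- read.csv2(tmp, header = TRUE, sep = ",")
# read.csv() defaults to sep = "," and dec = ".", so the same columns parse as numeric
as_csv <- read.csv(tmp, header = TRUE)

sapply(as_csv2, class)
sapply(as_csv, class)
unlink(tmp)  # remove the temporary file
```

With read.csv(), only genuinely non-numeric columns would need conversion afterwards.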

Restructure the data types

# Convert every column to numeric in one step; as.character() guards against
# factor columns, whose integer codes as.numeric() would otherwise return
bodyfat_1[] <- lapply(bodyfat_1, function(x) as.numeric(as.character(x)))

str(bodyfat_1)
## 'data.frame':    1008 obs. of  15 variables:
##  $ Density: num  1.07 1.09 1.04 1.08 1.03 ...
##  $ BodyFat: num  12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
##  $ Age    : num  23 22 22 26 24 24 26 25 25 23 ...
##  $ Weight : num  154 173 154 185 184 ...
##  $ Height : num  67.8 72.2 66.2 72.2 71.2 ...
##  $ Neck   : num  36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
##  $ Chest  : num  93.1 93.6 95.8 101.8 97.3 ...
##  $ Abdomen: num  85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
##  $ Hip    : num  94.5 98.7 99.2 101.2 101.9 ...
##  $ Thigh  : num  59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
##  $ Knee   : num  37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
##  $ Ankle  : num  21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
##  $ Biceps : num  32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
##  $ Forearm: num  27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
##  $ Wrist  : num  17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
head(bodyfat_1, 50)  # the same 50 rows listed above, now with numeric columns

Returns: Check for missing values (NAs)

any(is.na(bodyfat_1))
## [1] FALSE
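any(is.na(...)) only answers whether something is missing anywhere. A per-column count and a range check are a cheap complement, because implausible values are not NA. For example, the summaries further down report a BodyFat minimum of 0.00 and a Height minimum of 29.50. The minimal sketch below uses a few rows from the listing above (including the 29.5 height record) as a stand-in for bodyfat_1.

```r
# Sketch: per-column missing-value counts and value ranges in one pass.
# `toy` is a stand-in for bodyfat_1, built from rows of the listing above.
toy <- data.frame(
  BodyFat = c(12.3, 6.1, 25.3, 32.9),
  Height  = c(67.75, 72.25, 66.25, 29.5)
)
na_counts <- colSums(is.na(toy))   # number of NAs in each column
ranges <- sapply(toy, range)       # row 1 = minimum, row 2 = maximum
na_counts
ranges
```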

Exploratory data analysis

Density (create new categorical variables for Density and BodyFat)

summary(bodyfat_1$BodyFat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   14.20   20.80   20.75   26.80   50.70
bodyfat_1$fat_value_cat[bodyfat_1$BodyFat>=0 & bodyfat_1$BodyFat<=14.20]="Low body fat"
bodyfat_1$fat_value_cat[bodyfat_1$BodyFat>=14.21 & bodyfat_1$BodyFat<=26.80]="Medium body fat"
bodyfat_1$fat_value_cat[bodyfat_1$BodyFat>=26.81 & bodyfat_1$BodyFat<=50.70]="High body fat"
table(bodyfat_1$fat_value_cat)
## 
##   High body fat    Low body fat Medium body fat 
##             250             253             505
any(is.na(bodyfat_1$fat_value_cat))
## [1] FALSE
summary(bodyfat_1$Density)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.995   1.046   1.059   1.060   1.074   1.118
bodyfat_1$Density_cat[bodyfat_1$Density>=0.995 & bodyfat_1$Density<=1.046]="Low density"
bodyfat_1$Density_cat[bodyfat_1$Density>=1.0461 & bodyfat_1$Density<=1.074]="Medium density"
bodyfat_1$Density_cat[bodyfat_1$Density>=1.0741 & bodyfat_1$Density<=1.1179]="High density"
table(bodyfat_1$Density_cat)
## 
##   High density    Low density Medium density 
##            260            245            503
any(is.na(bodyfat_1$Density_cat))
## [1] FALSE
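The three-level categories above are cut at the 1st and 3rd quartiles. They are written as chained assignments with hand-typed boundaries such as 14.20 / 14.21, which can leave gaps or overlaps if a value falls between them. Below is a sketch of the same idea with cut(). It assumes the quartile cut points are computed from the data rather than copied from rounded summary() output. bin_by_quartiles() is a hypothetical helper, not part of any package.

```r
# Sketch: quartile-based three-level binning with cut().
# bin_by_quartiles() is a hypothetical helper defined here for illustration.
bin_by_quartiles <- function(x, labels) {
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  # Intervals are (-Inf, Q1], (Q1, Q3], (Q3, Inf]: closed on the right,
  # matching the "<= 1st Qu." / "<= 3rd Qu." boundaries used above,
  # with no gaps or overlaps between bins.
  cut(x, breaks = c(-Inf, q, Inf), labels = labels,
      right = TRUE, ordered_result = TRUE)
}

fat <- c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4)
fat_cat <- bin_by_quartiles(fat, c("Low body fat", "Medium body fat", "High body fat"))
table(fat_cat)
```

On bodyfat_1 this would be called as bin_by_quartiles(bodyfat_1$BodyFat, ...). The result is already an ordered factor, so the separate factor(..., ordered = TRUE) step is not needed.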

Visualize

bodyfat_1$Density_cat<-factor(bodyfat_1$Density_cat, ordered = TRUE, levels = c("Low density","Medium density","High density"))

bodyfat_1$fat_value_cat<-factor(bodyfat_1$fat_value_cat, ordered = TRUE, levels = c("Low body fat","Medium body fat","High body fat"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Density_cat))+geom_bar(position = position_dodge(preserve = "single"))
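The dodged bar chart shows counts of one category within another. The same counts can be read off as a contingency table with table(), and prop.table() turns them into within-row shares. The toy factors below are stand-ins for fat_value_cat and Density_cat, not values from the dataset.

```r
# Sketch: the counts behind a dodged bar chart, as a contingency table.
# fat_cat and dens_cat are toy stand-ins for fat_value_cat and Density_cat.
lv <- c("Low", "Medium", "High")
fat_cat  <- factor(c("Low", "Low", "Medium", "High", "Medium", "High"),
                   levels = lv, ordered = TRUE)
dens_cat <- factor(c("High", "High", "Medium", "Low", "Medium", "Low"),
                   levels = lv, ordered = TRUE)

counts <- table(fat = fat_cat, density = dens_cat)
counts
# Row proportions: share of each density category within each body-fat category
shares <- prop.table(counts, margin = 1)
shares
```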

#Univariate Distribution for Density variable #Approximately symmetric (skew = -0.02) #Platykurtic (excess kurtosis = -0.32)

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ readr   2.0.1     ✓ stringr 1.4.0
## ✓ purrr   0.3.4     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag()    masks stats::lag()
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
plot(density(bodyfat_1$Density))

hist(bodyfat_1$Density)

summary(bodyfat_1$Density)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.995   1.046   1.059   1.060   1.074   1.118
skew(bodyfat_1$Density)
## [1] -0.01920148
kurtosi(bodyfat_1$Density)
## [1] -0.3155286
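The comments in this report read skewness and kurtosis off psych::skew() and psych::kurtosi(). The sketch below shows what these compute, assuming their default type = 3 definitions. kurtosi() reports excess kurtosis, so 0 corresponds to a normal distribution. Negative values, such as -0.32 for Density, indicate lighter tails (platykurtic), and positive values indicate heavier tails (leptokurtic).

```r
# Sketch of the statistics assumed to underlie psych::skew() and
# psych::kurtosi() with type = 3:
#   skew = m3 / s^3,  excess kurtosis = m4 / s^4 - 3
# where m_k is the k-th central moment and s the sample SD (n - 1 denominator).
skew_b1 <- function(x) {
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}
kurt_b2 <- function(x) {
  m4 <- mean((x - mean(x))^4)
  m4 / sd(x)^4 - 3
}

x <- c(1, 2, 3, 4, 5)
skew_b1(x)  # symmetric data give 0
kurt_b2(x)  # flat, light-tailed data give negative excess kurtosis
```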

Check for correlation #Strong negative correlation (r = -0.94)

cor.test(bodyfat_1$BodyFat, bodyfat_1$Density)
## 
##  Pearson's product-moment correlation
## 
## data:  bodyfat_1$BodyFat and bodyfat_1$Density
## t = -89.992, df = 1006, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9495837 -0.9358921
## sample estimates:
##        cor 
## -0.9431365
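The t statistic in the cor.test() output follows directly from r and the sample size. Pearson's test uses t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The worked check below plugs in the values printed above.

```r
# Worked check of the cor.test() statistic from the printed values above.
r  <- -0.9431365   # sample estimate of the correlation
df <- 1006         # n - 2, with n = 1008 observations
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
round(t_stat, 2)   # compare with t = -89.992 reported by cor.test()
```

With a t this large on 1006 degrees of freedom, the p-value falls below the smallest value R prints (< 2.2e-16).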

Age (create new categorical variable for Age)

summary(bodyfat_1$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.00   38.00   45.00   46.38   55.00   84.00
bodyfat_1$age_cat[bodyfat_1$Age>=22 & bodyfat_1$Age<=38]="Young"
bodyfat_1$age_cat[bodyfat_1$Age>=39 & bodyfat_1$Age<=55]="Middle age"
bodyfat_1$age_cat[bodyfat_1$Age>=56 & bodyfat_1$Age<=84]="Old"
table(bodyfat_1$age_cat)
## 
## Middle age        Old      Young 
##        506        236        266

Visualize

bodyfat_1$age_cat<-factor(bodyfat_1$age_cat, ordered = TRUE, levels = c("Young", "Middle age","Old"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=age_cat))+geom_bar(position = position_dodge(preserve = "single")) 

getwd()
## [1] "/Users/Yutachen/Desktop/Project_presentation_template"

#Univariate Distribution for Age variable #Approximately symmetric, slight positive skew (skew = 0.28) #Platykurtic (excess kurtosis = -0.43)

plot(density(bodyfat_1$Age))

hist(bodyfat_1$Age)

summary(bodyfat_1$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.00   38.00   45.00   46.38   55.00   84.00
skew(bodyfat_1$Age)
## [1] 0.2781083
kurtosi(bodyfat_1$Age)
## [1] -0.4303825

Check for correlation #Weak positive correlation (r = 0.30)

cor(bodyfat_1$BodyFat, bodyfat_1$Age)
## [1] 0.299733
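Each section of this report repeats cor(bodyfat_1$BodyFat, ...) for one variable at a time. The sketch below computes all correlations with BodyFat in a single cor() call and ranks them by magnitude. The toy data frame uses the first five rows of the listing above as a stand-in for the numeric columns of bodyfat_1.

```r
# Sketch: correlate every column with BodyFat at once and rank by |r|.
# `toy` stands in for the numeric columns of bodyfat_1.
toy <- data.frame(
  BodyFat = c(12.3, 6.1, 25.3, 10.4, 28.7),
  Age     = c(23, 22, 22, 26, 24),
  Abdomen = c(85.2, 83, 87.9, 86.4, 100),
  Height  = c(67.75, 72.25, 66.25, 72.25, 71.25)
)
r <- cor(toy)[, "BodyFat"]               # one column of the correlation matrix
r <- r[names(r) != "BodyFat"]            # drop the trivial self-correlation
ranked <- r[order(abs(r), decreasing = TRUE)]
ranked
```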

Weight (create new categorical variable for Weight)

summary(bodyfat_1$Weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   118.5   160.5   178.0   180.9   199.0   367.1
bodyfat_1$weight_cat[bodyfat_1$Weight>=118.5 & bodyfat_1$Weight<=160.5]="Low weight"
bodyfat_1$weight_cat[bodyfat_1$Weight>=160.5 & bodyfat_1$Weight<=199]="Midium weight"
bodyfat_1$weight_cat[bodyfat_1$Weight>=199 & bodyfat_1$Weight<=368.1]="High weight"
table(bodyfat_1$weight_cat)
## 
##   High weight    Low weight Midium weight 
##           255           251           502

Visualize

bodyfat_1$weight_cat<-factor(bodyfat_1$weight_cat, ordered = TRUE, levels = c("Low weight", "Midium weight","High weight"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=weight_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Weight variable #Highly positively skewed (skew = 1.19) #Leptokurtic (excess kurtosis = 5.10)

plot(density(bodyfat_1$Weight))

hist(bodyfat_1$Weight)

summary(bodyfat_1$Weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   118.5   160.5   178.0   180.9   199.0   367.1
skew(bodyfat_1$Weight)
## [1] 1.192135
kurtosi(bodyfat_1$Weight)
## [1] 5.101895

Check for correlation #Moderate positive correlation (r = 0.61)

cor(bodyfat_1$BodyFat, bodyfat_1$Weight)
## [1] 0.6122873

Height

summary(bodyfat_1$Height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.50   70.00   72.00   71.90   74.00   81.25
bodyfat_1$height_cat[bodyfat_1$Height>=29.5 & bodyfat_1$Height<=70]="short"
bodyfat_1$height_cat[bodyfat_1$Height>=70.01 &bodyfat_1$Height<=74]="Midium"
bodyfat_1$height_cat[bodyfat_1$Height>=74.01 & bodyfat_1$Height<=81.25]="tall"
table(bodyfat_1$height_cat)
## 
## Midium  short   tall 
##    484    275    249
any(is.na(bodyfat_1$height_cat))
## [1] FALSE

Visualize

bodyfat_1$height_cat<-factor(bodyfat_1$height_cat, ordered = TRUE, levels = c("short", "Midium","tall"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=height_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Height variable #Highly negatively skewed (skew = -4.53) #Strongly leptokurtic (excess kurtosis = 46.66), likely driven by the minimum of 29.50

plot(density(bodyfat_1$Height))

hist(bodyfat_1$Height)

summary(bodyfat_1$Height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   29.50   70.00   72.00   71.90   74.00   81.25
skew(bodyfat_1$Height)
## [1] -4.527984
kurtosi(bodyfat_1$Height)
## [1] 46.6588
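The extreme Height skewness and kurtosis point to an outlier: the summary shows Q1 = 70 and Q3 = 74 but a minimum of 29.50. Tukey's 1.5 × IQR rule is one common way to flag such records. With Q1 = 70 and Q3 = 74 the lower fence is 64. The sketch below uses the first ten heights from the listing plus the 29.5 record. iqr_outliers() is a hypothetical helper, and k = 1.5 is the conventional multiplier.

```r
# Sketch: flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
# iqr_outliers() is a hypothetical helper; k = 1.5 is the usual default.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- diff(q)
  fence <- c(q[1] - k * iqr, q[2] + k * iqr)
  which(x < fence[1] | x > fence[2])   # indices of flagged values
}

heights <- c(67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.5, 74, 73.5, 29.5)
heights[iqr_outliers(heights)]
```

Whether a flagged record is a data-entry error or a genuine value has to be decided from context. The rule only identifies candidates.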

Check for correlation #Negligible correlation (r = -0.04)

cor(bodyfat_1$BodyFat, bodyfat_1$Height)
## [1] -0.03936278

Neck (create new categorical variable for Neck)

summary(bodyfat_1$Neck)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   31.10   37.90   39.80   39.74   41.50   54.70
bodyfat_1$neck_cat[bodyfat_1$Neck>=31.1 & bodyfat_1$Neck<=37.9]="Low"
bodyfat_1$neck_cat[bodyfat_1$Neck>=37.91 & bodyfat_1$Neck<=41.5]="Midium"
bodyfat_1$neck_cat[bodyfat_1$Neck>=41.51 & bodyfat_1$Neck<=54.7]="High"
table(bodyfat_1$neck_cat)
## 
##   High    Low Midium 
##    247    258    503
any(is.na(bodyfat_1$neck_cat))
## [1] FALSE

Visualize

bodyfat_1$neck_cat<-factor(bodyfat_1$neck_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=neck_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Neck variable #Approximately symmetric, slight positive skew (skew = 0.39) #Leptokurtic (excess kurtosis = 1.59)

plot(density(bodyfat_1$Neck))

hist(bodyfat_1$Neck)

summary(bodyfat_1$Neck)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   31.10   37.90   39.80   39.74   41.50   54.70
skew(bodyfat_1$Neck)
## [1] 0.3853075
kurtosi(bodyfat_1$Neck)
## [1] 1.593368

Check for correlation #Moderate positive correlation (r = 0.49)

cor(bodyfat_1$BodyFat, bodyfat_1$Neck)
## [1] 0.4905752

Chest (create new categorical variable for Chest)

summary(bodyfat_1$Chest)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   79.30   96.47  101.75  102.57  107.33  139.70
bodyfat_1$chest_cat[bodyfat_1$Chest>=79.3 & bodyfat_1$Chest<=96.47]="Low"
bodyfat_1$chest_cat[bodyfat_1$Chest>=96.48 & bodyfat_1$Chest<=107.33]="Medium"
bodyfat_1$chest_cat[bodyfat_1$Chest>=107.34 & bodyfat_1$Chest<=140]="High"

any(is.na(bodyfat_1$chest_cat))
## [1] FALSE

Visualize

bodyfat_1$chest_cat<-factor(bodyfat_1$chest_cat, ordered = TRUE, levels = c("Low", "Medium", "High"))
ggplot(bodyfat_1, aes(x=fat_value_cat, fill=chest_cat))+geom_bar(position = position_dodge(preserve = "single"))
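Ordered factors need an explicit levels argument. Without it, factor() sorts the labels alphabetically. This is why the str() output later in this report lists chest_cat as "High"<"Low"<"Medium". A minimal demonstration:

```r
# Sketch: factor() without `levels` sorts labels alphabetically.
x <- c("Low", "Medium", "High", "Medium")
alpha_levels <- levels(factor(x, ordered = TRUE))
fixed_levels <- levels(factor(x, ordered = TRUE, levels = c("Low", "Medium", "High")))
alpha_levels  # alphabetical order
fixed_levels  # intended low-to-high order
```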

#Univariate Distribution for Chest variable #Moderately positively skewed (skew = 0.65) #Leptokurtic (excess kurtosis = 0.90)

plot(density(bodyfat_1$Chest))

hist(bodyfat_1$Chest)

summary(bodyfat_1$Chest)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   79.30   96.47  101.75  102.57  107.33  139.70
skew(bodyfat_1$Chest)
## [1] 0.6546895
kurtosi(bodyfat_1$Chest)
## [1] 0.8955072

Check for correlation #Strong positive correlation (r = 0.71)

cor(bodyfat_1$BodyFat, bodyfat_1$Chest)
## [1] 0.7071347

Abdomen (create new categorical variable for Abdomen)

summary(bodyfat_1$Abdomen)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   69.40   86.70   93.10   94.36  101.20  151.70
bodyfat_1$abdomen_cat[bodyfat_1$Abdomen>=69.40 & bodyfat_1$Abdomen<=86.70]="Low"
bodyfat_1$abdomen_cat[bodyfat_1$Abdomen>=86.71 & bodyfat_1$Abdomen<=101.2]="Midium"
bodyfat_1$abdomen_cat[bodyfat_1$Abdomen>=101.21 & bodyfat_1$Abdomen<=151.7]="High"
table(bodyfat_1$abdomen_cat)
## 
##   High    Low Midium 
##    249    253    506
any(is.na(bodyfat_1$abdomen_cat))
## [1] FALSE

Visualize

bodyfat_1$abdomen_cat<-factor(bodyfat_1$abdomen_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=abdomen_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Abdomen variable #Moderately positively skewed (skew = 0.81) #Leptokurtic (excess kurtosis = 2.11)

plot(density(bodyfat_1$Abdomen))

hist(bodyfat_1$Abdomen)

summary(bodyfat_1$Abdomen)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   69.40   86.70   93.10   94.36  101.20  151.70
skew(bodyfat_1$Abdomen)
## [1] 0.8148106
kurtosi(bodyfat_1$Abdomen)
## [1] 2.109904

Check for correlation #Strong positive correlation (r = 0.82)

cor(bodyfat_1$BodyFat, bodyfat_1$Abdomen)
## [1] 0.815291

Hip (create new categorical variable for Hip)

summary(bodyfat_1$Hip)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   85.00   97.28  101.20  101.90  105.60  151.70
bodyfat_1$Hip_cat[bodyfat_1$Hip>=80 & bodyfat_1$Hip<=97.28]="Small"
bodyfat_1$Hip_cat[bodyfat_1$Hip>=97.29 & bodyfat_1$Hip<=105.60]="Midium"
bodyfat_1$Hip_cat[bodyfat_1$Hip>=105.61 & bodyfat_1$Hip<=152]="Large"
table(bodyfat_1$Hip_cat)
## 
##  Large Midium  Small 
##    251    505    252

Visualize

bodyfat_1$Hip_cat<-factor(bodyfat_1$Hip_cat, ordered = TRUE, levels = c("Small", "Midium","Large"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Hip_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Hip variable #Highly positively skewed (skew = 1.40) #Leptokurtic (excess kurtosis = 6.71)

plot(density(bodyfat_1$Hip))

hist(bodyfat_1$Hip)

summary(bodyfat_1$Hip)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   85.00   97.28  101.20  101.90  105.60  151.70
skew(bodyfat_1$Hip)
## [1] 1.397922
kurtosi(bodyfat_1$Hip)
## [1] 6.707908

Check for correlation #Moderate positive correlation (r = 0.63)

cor(bodyfat_1$BodyFat, bodyfat_1$Hip)
## [1] 0.6310884
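Hip, like Weight and Ankle, is strongly right-skewed and heavy-tailed, and Pearson's r can be pulled around by a few extreme values. Spearman's rank correlation, available through the method argument of cor(), is a common robustness check because it correlates ranks rather than raw values. The toy example below shows the difference on a monotone but non-linear relationship.

```r
# Sketch: Pearson vs Spearman on a monotone, non-linear relationship.
x <- c(1, 2, 3, 4, 5, 6)
y <- x^3                                     # strictly increasing, but curved
pearson  <- cor(x, y)                        # linear association: below 1
spearman <- cor(x, y, method = "spearman")   # rank association: exactly 1
pearson
spearman
```

On the real data, cor(bodyfat_1$BodyFat, bodyfat_1$Hip, method = "spearman") would give the rank-based counterpart to the value printed above.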

Thigh (create new categorical variable for Thigh)

summary(bodyfat_1$Thigh)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   47.20   57.60   60.80   61.16   64.20   90.80
bodyfat_1$Thigh_cat[bodyfat_1$Thigh>=47.2 & bodyfat_1$Thigh<=57.60]="Low"
bodyfat_1$Thigh_cat[bodyfat_1$Thigh>=57.61 & bodyfat_1$Thigh<=64.20]="Midium"
bodyfat_1$Thigh_cat[bodyfat_1$Thigh>=64.21 & bodyfat_1$Thigh<=91]="High"
table(bodyfat_1$Thigh_cat)
## 
##   High    Low Midium 
##    250    254    504
any(is.na(bodyfat_1$Thigh_cat))
## [1] FALSE

Visualize

bodyfat_1$Thigh_cat<-factor(bodyfat_1$Thigh_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Thigh_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Thigh variable #Moderately positively skewed (skew = 0.75) #Leptokurtic (excess kurtosis = 2.30)

plot(density(bodyfat_1$Thigh))

hist(bodyfat_1$Thigh)

summary(bodyfat_1$Thigh)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   47.20   57.60   60.80   61.16   64.20   90.80
skew(bodyfat_1$Thigh)
## [1] 0.7501523
kurtosi(bodyfat_1$Thigh)
## [1] 2.304358

Check for correlation #Moderate positive correlation (r = 0.57)

cor(bodyfat_1$BodyFat, bodyfat_1$Thigh)
## [1] 0.5710217

Knee (create new categorical variable for Knee)

summary(bodyfat_1$Knee)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   33.00   38.30   40.00   40.09   41.70   52.10
bodyfat_1$Knee_cat[bodyfat_1$Knee>=33 & bodyfat_1$Knee<=38.30]="Low"
bodyfat_1$Knee_cat[bodyfat_1$Knee>=38.31 & bodyfat_1$Knee<=41.70]="Midium"
bodyfat_1$Knee_cat[bodyfat_1$Knee>=41.71 & bodyfat_1$Knee<=53]="High"
table(bodyfat_1$Knee_cat)
## 
##   High    Low Midium 
##    245    261    502
any(is.na(bodyfat_1$Knee_cat))
## [1] FALSE

Visualize

bodyfat_1$Knee_cat<-factor(bodyfat_1$Knee_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Knee_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Knee variable #Approximately symmetric, slight positive skew (skew = 0.39) #Mildly leptokurtic (excess kurtosis = 0.68)

plot(density(bodyfat_1$Knee))

hist(bodyfat_1$Knee)

summary(bodyfat_1$Knee)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   33.00   38.30   40.00   40.09   41.70   52.10
skew(bodyfat_1$Knee)
## [1] 0.3930333
kurtosi(bodyfat_1$Knee)
## [1] 0.6793175

Check for correlation #Moderate positive correlation (r = 0.52)

cor(bodyfat_1$BodyFat, bodyfat_1$Knee)
## [1] 0.5151022

Ankle (create new categorical variable for Ankle)

summary(bodyfat_1$Ankle)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    19.1    23.3    24.5    24.6    25.6    36.9
bodyfat_1$Ankle_cat[bodyfat_1$Ankle>=19 & bodyfat_1$Ankle<=23.3]="Low"
bodyfat_1$Ankle_cat[bodyfat_1$Ankle>=23.4 & bodyfat_1$Ankle<=25.6]="Midium"
bodyfat_1$Ankle_cat[bodyfat_1$Ankle>=25.7 & bodyfat_1$Ankle<=37]="High"
table(bodyfat_1$Ankle_cat)
## 
##   High    Low Midium 
##    251    257    500
any(is.na(bodyfat_1$Ankle_cat))
## [1] FALSE

Visualize

bodyfat_1$Ankle_cat<-factor(bodyfat_1$Ankle_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Ankle_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Ankle variable #Highly positively skewed (skew = 1.36) #Leptokurtic (excess kurtosis = 5.92)

plot(density(bodyfat_1$Ankle))

hist(bodyfat_1$Ankle)

summary(bodyfat_1$Ankle)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    19.1    23.3    24.5    24.6    25.6    36.9
skew(bodyfat_1$Ankle)
## [1] 1.361183
kurtosi(bodyfat_1$Ankle)
## [1] 5.923247

Check for correlation #Weak positive correlation (r = 0.29)

cor(bodyfat_1$BodyFat, bodyfat_1$Ankle)
## [1] 0.2944401

Biceps (create new categorical variable for Biceps)

summary(bodyfat_1$Biceps)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.80   31.70   33.90   34.02   36.20   48.50
bodyfat_1$Biceps_cat[bodyfat_1$Biceps>=24.80 & bodyfat_1$Biceps<=31.70]="Low"
bodyfat_1$Biceps_cat[bodyfat_1$Biceps>=31.71 & bodyfat_1$Biceps<=36.20]="Midium"
bodyfat_1$Biceps_cat[bodyfat_1$Biceps>=36.21 & bodyfat_1$Biceps<=49]="High"
table(bodyfat_1$Biceps_cat)
## 
##   High    Low Midium 
##    251    253    504
any(is.na(bodyfat_1$Biceps_cat))
## [1] FALSE

Visualize

bodyfat_1$Biceps_cat<-factor(bodyfat_1$Biceps_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Biceps_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Biceps variable #Approximately symmetric (skew = 0.22) #Slightly leptokurtic (excess kurtosis = 0.31)

str(bodyfat_1)
## 'data.frame':    1008 obs. of  28 variables:
##  $ Density      : num  1.07 1.09 1.04 1.08 1.03 ...
##  $ BodyFat      : num  12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
##  $ Age          : num  23 22 22 26 24 24 26 25 25 23 ...
##  $ Weight       : num  154 173 154 185 184 ...
##  $ Height       : num  67.8 72.2 66.2 72.2 71.2 ...
##  $ Neck         : num  36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
##  $ Chest        : num  93.1 93.6 95.8 101.8 97.3 ...
##  $ Abdomen      : num  85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
##  $ Hip          : num  94.5 98.7 99.2 101.2 101.9 ...
##  $ Thigh        : num  59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
##  $ Knee         : num  37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
##  $ Ankle        : num  21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
##  $ Biceps       : num  32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
##  $ Forearm      : num  27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
##  $ Wrist        : num  17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
##  $ fat_value_cat: Ord.factor w/ 3 levels "Low body fat"<..: 1 1 2 1 3 2 2 1 1 1 ...
##  $ Density_cat  : Ord.factor w/ 3 levels "Low density"<..: 2 3 1 3 1 2 2 2 3 2 ...
##  $ age_cat      : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 2 1 1 1 2 1 1 2 3 ...
##  $ weight_cat   : Ord.factor w/ 3 levels "Low weight"<"Midium weight"<..: 1 2 1 2 2 3 2 2 2 2 ...
##  $ height_cat   : Ord.factor w/ 3 levels "short"<"Midium"<..: 1 2 1 2 2 3 1 2 2 2 ...
##  $ neck_cat     : chr  "Low" "Midium" "Low" "Low" ...
##  $ chest_cat    : Ord.factor w/ 3 levels "High"<"Low"<"Medium": 2 2 2 3 3 3 3 3 3 3 ...
##  $ abdomen_cat  : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 2 2 2 2 1 2 ...
##  $ Hip_cat      : Ord.factor w/ 3 levels "Small"<"Midium"<..: 1 2 2 2 2 3 2 1 2 2 ...
##  $ Thigh_cat    : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 2 2 2 2 3 2 2 2 2 ...
##  $ Knee_cat     : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 3 3 1 2 1 2 ...
##  $ Ankle_cat    : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 1 1 2 2 2 2 1 2 2 ...
##  $ Biceps_cat   : chr  "Midium" "Low" "Low" "Midium" ...
plot(density(bodyfat_1$Biceps))

summary(bodyfat_1$Biceps)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   24.80   31.70   33.90   34.02   36.20   48.50
skew(bodyfat_1$Biceps)
## [1] 0.2234067
kurtosi(bodyfat_1$Biceps)
## [1] 0.3088062

Check for correlation #Moderate positive correlation (r = 0.50)

cor(bodyfat_1$BodyFat, bodyfat_1$Biceps)
## [1] 0.5003333

Forearm (create new categorical variable for Forearm)

summary(bodyfat_1$Forearm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   28.70   30.20   30.16   31.70   37.90
bodyfat_1$Forearm_cat[bodyfat_1$Forearm>=21 & bodyfat_1$Forearm<=28.70]="Low"
bodyfat_1$Forearm_cat[bodyfat_1$Forearm>=28.71 & bodyfat_1$Forearm<=31.70]="Midium"
bodyfat_1$Forearm_cat[bodyfat_1$Forearm>=31.71 & bodyfat_1$Forearm<=38]="High"
table(bodyfat_1$Forearm_cat)
## 
##   High    Low Midium 
##    242    255    511
any(is.na(bodyfat_1$Forearm_cat))
## [1] FALSE

Visualize

bodyfat_1$Forearm_cat<-factor(bodyfat_1$Forearm_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Forearm_cat))+geom_bar(position = position_dodge(preserve = "single"))

#Univariate Distribution for Forearm variable #Approximately symmetric, slight negative skew (skew = -0.15) #Slightly leptokurtic (excess kurtosis = 0.45)

str(bodyfat_1)
## 'data.frame':    1008 obs. of  29 variables:
##  $ Density      : num  1.07 1.09 1.04 1.08 1.03 ...
##  $ BodyFat      : num  12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
##  $ Age          : num  23 22 22 26 24 24 26 25 25 23 ...
##  $ Weight       : num  154 173 154 185 184 ...
##  $ Height       : num  67.8 72.2 66.2 72.2 71.2 ...
##  $ Neck         : num  36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
##  $ Chest        : num  93.1 93.6 95.8 101.8 97.3 ...
##  $ Abdomen      : num  85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
##  $ Hip          : num  94.5 98.7 99.2 101.2 101.9 ...
##  $ Thigh        : num  59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
##  $ Knee         : num  37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
##  $ Ankle        : num  21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
##  $ Biceps       : num  32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
##  $ Forearm      : num  27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
##  $ Wrist        : num  17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
##  $ fat_value_cat: Ord.factor w/ 3 levels "Low body fat"<..: 1 1 2 1 3 2 2 1 1 1 ...
##  $ Density_cat  : Ord.factor w/ 3 levels "Low density"<..: 2 3 1 3 1 2 2 2 3 2 ...
##  $ age_cat      : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 2 1 1 1 2 1 1 2 3 ...
##  $ weight_cat   : Ord.factor w/ 3 levels "Low weight"<"Midium weight"<..: 1 2 1 2 2 3 2 2 2 2 ...
##  $ height_cat   : Ord.factor w/ 3 levels "short"<"Midium"<..: 1 2 1 2 2 3 1 2 2 2 ...
##  $ neck_cat     : chr  "Low" "Midium" "Low" "Low" ...
##  $ chest_cat    : Ord.factor w/ 3 levels "High"<"Low"<"Medium": 2 2 2 3 3 3 3 3 3 3 ...
##  $ abdomen_cat  : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 2 2 2 2 1 2 ...
##  $ Hip_cat      : Ord.factor w/ 3 levels "Small"<"Midium"<..: 1 2 2 2 2 3 2 1 2 2 ...
##  $ Thigh_cat    : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 2 2 2 2 3 2 2 2 2 ...
##  $ Knee_cat     : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 1 2 1 3 3 1 2 1 2 ...
##  $ Ankle_cat    : Ord.factor w/ 3 levels "Low"<"Midium"<..: 2 1 1 2 2 2 2 1 2 2 ...
##  $ Biceps_cat   : chr  "Midium" "Low" "Low" "Midium" ...
##  $ Forearm_cat  : Ord.factor w/ 3 levels "Low"<"Midium"<..: 1 2 1 2 1 2 1 2 2 2 ...
plot(density(bodyfat_1$Forearm))

summary(bodyfat_1$Forearm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   21.00   28.70   30.20   30.16   31.70   37.90
skew(bodyfat_1$Forearm)
## [1] -0.1509276
kurtosi(bodyfat_1$Forearm)
## [1] 0.4527081
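skew() and kurtosi() come from the psych package. The psych documentation describes their default (type = 3) estimators as b1 = m3/s^3 and b2 = m4/s^4 - 3. Assuming those definitions, the sketch below reimplements both in base R on a toy vector so that the sign conventions used in this report can be checked. The function names skew_b1() and kurt_b2() are our own.

```r
# Skewness and excess kurtosis from central moments, in base R.
# Assumption: these mirror psych's default (type = 3) estimators,
#   b1 = m3 / s^3 and b2 = m4 / s^4 - 3,
# where m_k = mean((x - mean(x))^k) and s is the sample SD (n - 1 denominator).
skew_b1 <- function(x) {
  m3 <- mean((x - mean(x))^3)
  m3 / sd(x)^3
}
kurt_b2 <- function(x) {
  m4 <- mean((x - mean(x))^4)
  m4 / sd(x)^4 - 3
}

# Interpretation used in this report:
#   skew < 0: longer left tail;  skew > 0: longer right tail
#   excess kurtosis > 0: leptokurtic (heavier tails than a normal)
#   excess kurtosis < 0: platykurtic (lighter tails, flatter peak)
x <- c(1, 2, 3, 4, 5)
skew_b1(x)  # 0: symmetric
kurt_b2(x)  # -1.912: platykurtic
```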

Check for correlation with BodyFat (positive correlation)

cor(bodyfat_1$BodyFat, bodyfat_1$Forearm)
## [1] 0.3792227

Wrist (create a new categorical variable for Wrist)

summary(bodyfat_1$Wrist)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.80   18.57   19.50   19.48   20.30   23.90
bodyfat_1$Wrist_cat[bodyfat_1$Wrist>=15 & bodyfat_1$Wrist<=18.57]="Low"
bodyfat_1$Wrist_cat[bodyfat_1$Wrist>=18.58 & bodyfat_1$Wrist<=20.30]="Midium"
bodyfat_1$Wrist_cat[bodyfat_1$Wrist>=20.31 & bodyfat_1$Wrist<=24]="High"
table(bodyfat_1$Wrist_cat)
## 
##   High    Low Midium 
##    249    252    507
any(is.na(bodyfat_1$Wrist_cat))
## [1] FALSE

visualization

bodyfat_1$Wrist_cat<-factor(bodyfat_1$Wrist_cat, ordered = TRUE, levels = c("Low", "Midium","High"))

ggplot(bodyfat_1, aes(x=fat_value_cat, fill=Wrist_cat))+geom_bar(position = position_dodge(preserve = "single"))

Univariate distribution of Wrist: approximately normal, slightly positively skewed (skew = 0.10) and platykurtic (excess kurtosis = -0.18).

hist(bodyfat_1$Wrist)

summary(bodyfat_1$Wrist)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   15.80   18.57   19.50   19.48   20.30   23.90
skew(bodyfat_1$Wrist)
## [1] 0.1037711
kurtosi(bodyfat_1$Wrist)
## [1] -0.1767461

Check for correlation with BodyFat (positive correlation)

cor(bodyfat_1$BodyFat, bodyfat_1$Wrist)
## [1] 0.3427374
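cor() reports the strength and direction of an association but does not test whether it differs from zero. cor.test() adds a t-test of H0: rho = 0 and a confidence interval. The sketch below applies it to the ten rows printed by head() only. With so few rows, the estimate can differ markedly from the full-data value of 0.34. On the full data, the call would be cor.test(bodyfat_1$BodyFat, bodyfat_1$Wrist).

```r
# BodyFat and Wrist for the first 10 participants, copied from the head() output
body_fat <- c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7)
wrist    <- c(17.1, 18.2, 16.6, 18.2, 17.7, 18.8, 17.7, 18.8, 18.2, 19.2)

# Pearson estimate plus a t-test of H0: rho = 0 and a 95% confidence interval
res <- cor.test(body_fat, wrist, method = "pearson")
res$estimate  # same value as cor(body_fat, wrist)
res$p.value   # two-sided p-value
res$conf.int  # 95% confidence interval for rho

# The test statistic is t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom
r <- unname(res$estimate)
n <- length(body_fat)
t_manual <- r * sqrt((n - 2) / (1 - r^2))
all.equal(t_manual, unname(res$statistic))  # TRUE
```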

Bias analysis

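Returns: Check for bias. Age is used as the demographic variable that identifies the type of participant, split at its mean (46.38) into "young" and "old". A class imbalance (CI) value close to 0 means the two groups are roughly balanced. Here CI = -0.08, so the sample is only slightly skewed toward younger participants.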

table(bodyfat_1$age_cat)
## 
##    Low Midium   High 
##    258    503    247
summary(bodyfat_1$Age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.00   38.00   45.00   46.38   55.00   84.00
bodyfat_1$age_cat_2[bodyfat_1$Age>=22 & bodyfat_1$Age<=46.38]<-"young"
bodyfat_1$age_cat_2[bodyfat_1$Age>=46.39 & bodyfat_1$Age<=84]<-"old"
table(bodyfat_1$age_cat_2)
## 
##   old young 
##   463   545
na<-463
nb<-545
CI<-(na-nb)/(na+nb)
CI
## [1] -0.08134921
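Reading CI as the class-imbalance measure (n_a - n_b)/(n_a + n_b), a small helper that counts the groups itself avoids hard-coding 463 and 545. The function name class_imbalance() is our own. The counts rebuilt below are the ones printed by table(bodyfat_1$age_cat_2).

```r
# Class imbalance between two groups a and b: CI = (n_a - n_b) / (n_a + n_b).
# 0 means the groups are the same size; -1 or +1 means only one group is present.
class_imbalance <- function(groups, a, b) {
  n_a <- sum(groups == a)
  n_b <- sum(groups == b)
  (n_a - n_b) / (n_a + n_b)
}

# Rebuild the age_cat_2 counts reported above: 463 "old", 545 "young"
age_groups <- c(rep("old", 463), rep("young", 545))
class_imbalance(age_groups, "old", "young")  # -0.08134921
```

On the real data, the call would be class_imbalance(bodyfat_1$age_cat_2, "old", "young"). A negative value means that group b (here "young") is the larger one.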

Returns: Correlation matrix. Convert all variables in the bodyfat dataset to numeric variables

#convert every column of bodyfat to numeric (equivalent to one as.numeric() call per column)
bodyfat[]<-lapply(bodyfat, as.numeric)
str(bodyfat)
## 'data.frame':    1008 obs. of  15 variables:
##  $ Density: num  1.07 1.09 1.04 1.08 1.03 ...
##  $ BodyFat: num  12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
##  $ Age    : num  23 22 22 26 24 24 26 25 25 23 ...
##  $ Weight : num  154 173 154 185 184 ...
##  $ Height : num  67.8 72.2 66.2 72.2 71.2 ...
##  $ Neck   : num  36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
##  $ Chest  : num  93.1 93.6 95.8 101.8 97.3 ...
##  $ Abdomen: num  85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
##  $ Hip    : num  94.5 98.7 99.2 101.2 101.9 ...
##  $ Thigh  : num  59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
##  $ Knee   : num  37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
##  $ Ankle  : num  21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
##  $ Biceps : num  32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
##  $ Forearm: num  27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
##  $ Wrist  : num  17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
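The conversion step is presumably needed because read.csv2() is intended for files that use ";" as the separator and "," as the decimal mark. With sep = "," and the default dec = ",", a value such as 1.0708 is not recognised as a number and is read as text. The sketch below reproduces that behaviour on two rows typed in from the head() output and then applies the one-line lapply() conversion. Reading the file with read.csv(), which defaults to dec = ".", would avoid the conversion altogether.

```r
# Two rows of the body-fat CSV: "," separates fields and "." is the decimal mark
csv_text <- "Density,BodyFat,Age\n1.0708,12.3,23\n1.0853,6.1,22\n"

# read.csv2() defaults to dec = ",", so "1.0708" does not parse as a number
raw <- read.csv2(text = csv_text, header = TRUE, sep = ",", stringsAsFactors = FALSE)
sapply(raw, class)  # Density and BodyFat are "character", Age is "integer"

# Convert every column in one step, equivalent to one as.numeric() call per column
raw[] <- lapply(raw, as.numeric)
sapply(raw, class)  # all "numeric"

# read.csv() uses sep = "," and dec = "." by default, so it parses the numbers directly
direct <- read.csv(text = csv_text)
all.equal(raw$Density, direct$Density)  # TRUE
```

stringsAsFactors = FALSE is set explicitly. Under older R versions that default to factors, as.numeric() would return factor codes rather than the values.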

Returns: Plot the correlation matrix

library("Hmisc")
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
## 
##     describe
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following object is masked from 'package:plotly':
## 
##     subplot
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library("corrplot")
## corrplot 0.92 loaded
bodyfat.cor<-cor(bodyfat)
corrplot(bodyfat.cor,method = "number", number.cex = 0.6)

Returns: Linear regression, building model_1 (all variables included). The overall F-test p-value is < 2.2e-16, so the model is highly significant: at least one of the predictor variables is significantly related to the outcome variable (BodyFat).

model_1<-lm(BodyFat~Density+Age+Weight+Height+Neck+Chest+Abdomen+Hip+Thigh+Knee+Ankle+Biceps+Forearm+Wrist,  data = bodyfat)
summary(model_1)
## 
## Call:
## lm(formula = BodyFat ~ Density + Age + Weight + Height + Neck + 
##     Chest + Abdomen + Hip + Thigh + Knee + Ankle + Biceps + Forearm + 
##     Wrist, data = bodyfat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.9541 -0.8261 -0.0661  0.7261 15.3108 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.812e+02  5.996e+00  63.578  < 2e-16 ***
## Density     -3.968e+02  5.285e+00 -75.078  < 2e-16 ***
## Age         -8.628e-03  6.161e-03  -1.400 0.161722    
## Weight      -1.458e-01  6.806e-03 -21.423  < 2e-16 ***
## Height       9.880e-02  1.770e-02   5.582 3.07e-08 ***
## Neck         1.630e-01  4.242e-02   3.844 0.000129 ***
## Chest        9.513e-02  1.875e-02   5.075 4.63e-07 ***
## Abdomen      1.057e-01  2.015e-02   5.246 1.90e-07 ***
## Hip          1.925e-01  2.604e-02   7.395 3.00e-13 ***
## Thigh        2.423e-02  2.727e-02   0.889 0.374464    
## Knee         2.040e-01  4.506e-02   4.526 6.72e-06 ***
## Ankle        1.121e-01  4.122e-02   2.719 0.006654 ** 
## Biceps       4.691e-02  3.220e-02   1.457 0.145570    
## Forearm      8.500e-02  3.853e-02   2.206 0.027596 *  
## Wrist        8.928e-01  9.040e-02   9.876  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.654 on 993 degrees of freedom
## Multiple R-squared:  0.9621, Adjusted R-squared:  0.9616 
## F-statistic:  1802 on 14 and 993 DF,  p-value: < 2.2e-16
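The p-value quoted for model_1 belongs to the overall F-test, which compares the fitted model with an intercept-only model. R prints "< 2.2e-16" whenever the p-value falls below machine epsilon (.Machine$double.eps, about 2.2e-16). The sketch below recovers the F statistic from the rounded R-squared using F = (R2/k) / ((1 - R2)/(n - k - 1)) and evaluates its upper-tail probability with pf(). The helper name f_from_r2() is our own. The small gap from the printed F values comes from R-squared being rounded to four decimals.

```r
# Overall F statistic from R-squared, with k predictors and df_resid = n - k - 1 residual df
f_from_r2 <- function(r2, k, df_resid) {
  (r2 / k) / ((1 - r2) / df_resid)
}

# model_1: R-squared 0.9621, 14 predictors, 993 residual df (from summary(model_1))
f1 <- f_from_r2(0.9621, k = 14, df_resid = 993)
f1  # about 1800.5, close to the printed 1802

# Upper-tail probability of F(14, 993): far smaller than 2.2e-16
pf(f1, df1 = 14, df2 = 993, lower.tail = FALSE)

# Same check for model_3 (R-squared 0.9006, 3 predictors, 1004 residual df; printed F = 3033)
f_from_r2(0.9006, k = 3, df_resid = 1004)
```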

Returns: Create the prediction equation for regression model_1 (pred1)

bodyfat$bodyfat_pred_1=(-396.8*(bodyfat$Density)) + (-0.008628*(bodyfat$Age)) + (-0.1458*(bodyfat$Weight)) + (0.0988*(bodyfat$Height)) + (0.163*(bodyfat$Neck)) +(0.09513*(bodyfat$Chest)) + (0.1057*(bodyfat$Abdomen)) + (0.1925*(bodyfat$Hip)) +(0.02423*(bodyfat$Thigh)) + (0.204*(bodyfat$Knee)) + (0.1121*(bodyfat$Ankle)) +
(0.04691*(bodyfat$Biceps)) + (0.085*(bodyfat$Forearm)) + (0.8928*(bodyfat$Wrist)+381.2)
head(bodyfat)
Density BodyFat Age Weight Height Neck Chest Abdomen Hip Thigh Knee Ankle Biceps Forearm Wrist bodyfat_pred_1
1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2 94.5 59.0 37.3 21.9 32.0 27.4 17.1 12.857019
1.0853 6.1 22 173.25 72.25 38.5 93.6 83.0 98.7 58.7 37.3 23.4 30.5 28.9 18.2 6.984968
1.0414 25.3 22 154.00 66.25 34.0 95.8 87.9 99.2 59.6 38.9 24.0 28.8 25.2 16.6 25.301044
1.0751 10.4 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2 10.860803
1.0340 28.7 24 184.25 71.25 34.4 97.3 100.0 101.9 63.2 42.2 24.0 32.2 27.7 17.7 28.424775
1.0502 20.9 24 210.25 74.75 39.0 104.5 94.4 107.8 66.0 42.0 25.6 35.7 30.6 18.8 22.129350

Returns: Building model_2 (without outliers). Check for outliers using Cook's distance

cooksd<-cooks.distance(model_1)
sample_size<-nrow(bodyfat)
#plot the Cook's distance of each observation
plot(cooksd)
#add a cutoff line at 8/n
abline(h=8/sample_size, col="red")
#label the observations above the cutoff
text(x=1:length(cooksd)+1, y=cooksd, labels=ifelse(cooksd>8/sample_size, names(cooksd),""),col="red")
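cooks.distance() measures how much all fitted values shift when one observation is dropped. The sketch below computes it from residuals and leverages with the standard formula D_i = e_i^2 h_ii / (p s^2 (1 - h_ii)^2) and checks the result against cooks.distance(). It uses a small two-predictor model fitted to the ten rows printed by head(), not model_1.

```r
# Three columns for the first 10 participants, copied from the head() output
d <- data.frame(
  BodyFat = c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7),
  Density = c(1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.0549, 1.0704, 1.0900, 1.0722),
  Weight  = c(154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, 176.00, 191.00, 198.25)
)
fit <- lm(BodyFat ~ Density + Weight, data = d)

# Cook's distance: D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2
e  <- residuals(fit)
h  <- hatvalues(fit)               # leverages h_ii
p  <- length(coef(fit))            # number of coefficients, including the intercept
s2 <- sum(e^2) / df.residual(fit)  # residual variance estimate
cooks_manual <- e^2 / (p * s2) * h / (1 - h)^2

all.equal(cooks_manual, cooks.distance(fit))  # TRUE

# Observations above the 8/n cutoff used in this report
names(which(cooks_manual > 8 / nrow(d)))
```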

Returns: Remove outliers (the 9 observations with the largest Cook's distance)

top_x_outliers<-9
influential<-as.numeric(names(sort(cooksd, decreasing = TRUE)[1:top_x_outliers]))
#subset dataset without outliers
dataframe_no_outliers<-bodyfat[-influential, ]

Returns: Build the linear regression model on the data without outliers, dropping Age, Thigh, Ankle, Biceps and Forearm from the predictors

model_no_outlier<-lm(BodyFat~Density+Weight+Height+Neck+Chest+Abdomen+Hip+Knee+Wrist,data =dataframe_no_outliers)
summary(model_no_outlier) 
## 
## Call:
## lm(formula = BodyFat ~ Density + Weight + Height + Neck + Chest + 
##     Abdomen + Hip + Knee + Wrist, data = dataframe_no_outliers)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8949 -0.7679  0.0254  0.7601  6.2699 
## 
## Coefficients:
##               Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)  404.55891    4.71200   85.857  < 2e-16 ***
## Density     -427.16424    4.13369 -103.337  < 2e-16 ***
## Weight        -0.16284    0.00541  -30.099  < 2e-16 ***
## Height         0.28599    0.02236   12.793  < 2e-16 ***
## Neck           0.29722    0.03150    9.434  < 2e-16 ***
## Chest          0.10374    0.01441    7.200 1.19e-12 ***
## Abdomen        0.06570    0.01459    4.504 7.46e-06 ***
## Hip            0.25343    0.01795   14.120  < 2e-16 ***
## Knee           0.15182    0.03362    4.516 7.08e-06 ***
## Wrist          0.88570    0.06112   14.492  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.289 on 989 degrees of freedom
## Multiple R-squared:  0.9767, Adjusted R-squared:  0.9765 
## F-statistic:  4614 on 9 and 989 DF,  p-value: < 2.2e-16

Returns: Create the prediction equation for regression model_2 (pred2, without outliers)

bodyfat$bodyfat_pred_2=(-427.1642*(bodyfat$Density)) + (-0.16284*(bodyfat$Weight)) + (0.28599*(bodyfat$Height)) + (0.29722*(bodyfat$Neck)) +(0.10374*(bodyfat$Chest)) + (0.06570*(bodyfat$Abdomen)) + (0.25343*(bodyfat$Hip)) + (0.15182*(bodyfat$Knee)) + (0.88570*(bodyfat$Wrist)+404.55891)
head(bodyfat)
Density BodyFat Age Weight Height Neck Chest Abdomen Hip Thigh Knee Ankle Biceps Forearm Wrist bodyfat_pred_1 bodyfat_pred_2
1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2 94.5 59.0 37.3 21.9 32.0 27.4 17.1 12.857019 12.181926
1.0853 6.1 22 173.25 72.25 38.5 93.6 83.0 98.7 58.7 37.3 23.4 30.5 28.9 18.2 6.984968 6.810652
1.0414 25.3 22 154.00 66.25 34.0 95.8 87.9 99.2 59.6 38.9 24.0 28.8 25.2 16.6 25.301044 25.147066
1.0751 10.4 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2 10.860803 10.675748
1.0340 28.7 24 184.25 71.25 34.4 97.3 100.0 101.9 63.2 42.2 24.0 32.2 27.7 17.7 28.424775 28.041126
1.0502 20.9 24 210.25 74.75 39.0 104.5 94.4 107.8 66.0 42.0 25.6 35.7 30.6 18.8 22.129350 22.073554

Returns: Building model_3. Check for variable importance

#load packages and prepare the training scheme
library(varImp)
## Loading required package: measures
## 
## Attaching package: 'measures'
## The following object is masked from 'package:psych':
## 
##     AUC
## Loading required package: party
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## 
## Attaching package: 'strucchange'
## The following object is masked from 'package:stringr':
## 
##     boundary
library(caret)
## 
## Attaching package: 'caret'
## The following object is masked from 'package:varImp':
## 
##     varImp
## The following objects are masked from 'package:measures':
## 
##     MAE, RMSE
## The following object is masked from 'package:survival':
## 
##     cluster
## The following object is masked from 'package:purrr':
## 
##     lift
str(bodyfat)
## 'data.frame':    1008 obs. of  17 variables:
##  $ Density       : num  1.07 1.09 1.04 1.08 1.03 ...
##  $ BodyFat       : num  12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
##  $ Age           : num  23 22 22 26 24 24 26 25 25 23 ...
##  $ Weight        : num  154 173 154 185 184 ...
##  $ Height        : num  67.8 72.2 66.2 72.2 71.2 ...
##  $ Neck          : num  36.2 38.5 34 37.4 34.4 39 36.4 37.8 38.1 42.1 ...
##  $ Chest         : num  93.1 93.6 95.8 101.8 97.3 ...
##  $ Abdomen       : num  85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
##  $ Hip           : num  94.5 98.7 99.2 101.2 101.9 ...
##  $ Thigh         : num  59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
##  $ Knee          : num  37.3 37.3 38.9 37.3 42.2 42 38.3 39.4 38.3 41.7 ...
##  $ Ankle         : num  21.9 23.4 24 22.8 24 25.6 22.9 23.2 23.8 25 ...
##  $ Biceps        : num  32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
##  $ Forearm       : num  27.4 28.9 25.2 29.4 27.7 30.6 27.8 29 31.1 30 ...
##  $ Wrist         : num  17.1 18.2 16.6 18.2 17.7 18.8 17.7 18.8 18.2 19.2 ...
##  $ bodyfat_pred_1: num  12.86 6.98 25.3 10.86 28.42 ...
##  $ bodyfat_pred_2: num  12.18 6.81 25.15 10.68 28.04 ...
control<-trainControl(method="repeatedcv", number=10, repeats=3)
#train a linear model with repeated 10-fold cross-validation (10 folds x 3 repeats);
#bodyfat_pred_1 is excluded because it is an exact linear combination of the other predictors,
#which would otherwise make every resampled fit rank-deficient
lm_model<-train(BodyFat~., data=subset(dataframe_no_outliers, select=-bodyfat_pred_1), method="lm", preProcess="scale", trControl= control)
#estimate variables importance
importance<- varImp(lm_model, scale=FALSE)
#summarize importance
print(importance)
## lm variable importance
## 
##         Overall
## Density 105.926
## Weight   33.002
## Height   13.106
## Wrist    10.653
## Hip       9.685
## Abdomen   8.044
## Neck      7.509
## Chest     6.617
## Ankle     6.389
## Biceps    5.025
## Age       2.917
## Thigh     2.374
## Knee      1.995
## Forearm   1.175
plot(importance)
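For linear models, caret's varImp() reports the absolute value of each coefficient's t statistic. That is why Density, which has by far the largest |t| in the regression summaries, ranks first. The sketch below reproduces that ranking logic in base R on the ten printed rows. It also shows that rescaling the predictors leaves the slopes' t statistics unchanged. preProcess = "scale" only divides by the standard deviation, while scale() below also centres, but neither affects the slopes' t statistics.

```r
# Four columns for the first 10 participants, copied from the head() output
d <- data.frame(
  BodyFat = c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7),
  Density = c(1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.0549, 1.0704, 1.0900, 1.0722),
  Weight  = c(154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, 176.00, 191.00, 198.25),
  Height  = c(67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, 74.00, 73.50)
)
fit <- lm(BodyFat ~ Density + Weight + Height, data = d)

# Importance as |t| of each slope (the intercept row is dropped), largest first
coefs <- summary(fit)$coefficients
importance <- sort(abs(coefs[-1, "t value"]), decreasing = TRUE)
importance

# Standardising the predictors changes the coefficients but not their t statistics
scaled <- data.frame(BodyFat = d$BodyFat, scale(d[, c("Density", "Weight", "Height")]))
fit_scaled <- lm(BodyFat ~ Density + Weight + Height, data = scaled)
all.equal(abs(summary(fit_scaled)$coefficients[-1, "t value"]),
          abs(coefs[-1, "t value"]))  # TRUE
```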

Returns: Building model_3 (using only the variables with the greatest importance for the outcome variable BodyFat: Density, Weight and Height)

model_3<-lm(BodyFat~Density+Weight+Height,data = bodyfat)
summary(model_3)
## 
## Call:
## lm(formula = BodyFat ~ Density + Weight + Height, data = bodyfat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9136  -1.7808  -0.0367   1.7259  16.8781 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.331e+02  6.143e+00  70.507  < 2e-16 ***
## Density     -4.029e+02  5.855e+00 -68.819  < 2e-16 ***
## Weight       1.672e-02  3.993e-03   4.188 3.07e-05 ***
## Height       1.631e-01  2.504e-02   6.514 1.15e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.664 on 1004 degrees of freedom
## Multiple R-squared:  0.9006, Adjusted R-squared:  0.9003 
## F-statistic:  3033 on 3 and 1004 DF,  p-value: < 2.2e-16
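model_3 retains most of model_1's explanatory power, and Density is by far its strongest term. One likely reason is how the data were built. In the widely distributed version of this body-fat dataset, BodyFat was computed from underwater-weighing Density with Siri's equation, BodyFat = 495/Density - 450. We have not confirmed this for this particular file. However, the check below on the ten rows printed by head() reproduces the recorded BodyFat to one decimal in nine of ten rows; row 6 differs by about 0.4. If the same holds for the full data, the strong negative Density coefficient largely reflects how BodyFat was derived.

```r
# Density and BodyFat for the first 10 participants, copied from the head() output
density  <- c(1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.0549, 1.0704, 1.0900, 1.0722)
body_fat <- c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7)

# Siri's equation: percent body fat = 495 / density - 450
siri <- 495 / density - 450

# Side-by-side comparison with the recorded values
data.frame(density, body_fat, siri = round(siri, 1), diff = round(siri - body_fat, 2))
```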

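Returns: Create the prediction equation for regression model_3 (pred3). Caution: the coefficients in this equation (Density -424.7, Weight -0.03045, intercept 440.4) do not match the model_3 summary above (Density -402.9, Weight 0.01672, intercept 433.1). This mismatch is why bodyfat_pred_3 falls far below the observed BodyFat, and is often negative, in the table below.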

bodyfat$bodyfat_pred_3=(-424.7*(bodyfat$Density)) + (-0.03045*(bodyfat$Weight)) +  (0.1631*(bodyfat$Height)+440.4)
head(bodyfat)
Density BodyFat Age Weight Height Neck Chest Abdomen Hip Thigh Knee Ankle Biceps Forearm Wrist bodyfat_pred_1 bodyfat_pred_2 bodyfat_pred_3
1.0708 12.3 23 154.25 67.75 36.2 93.1 85.2 94.5 59.0 37.3 21.9 32.0 27.4 17.1 12.857019 12.181926 -8.0156475
1.0853 6.1 22 173.25 72.25 38.5 93.6 83.0 98.7 58.7 37.3 23.4 30.5 28.9 18.2 6.984968 6.810652 -14.0183975
1.0414 25.3 22 154.00 66.25 34.0 95.8 87.9 99.2 59.6 38.9 24.0 28.8 25.2 16.6 25.301044 25.147066 4.2334950
1.0751 10.4 26 184.75 72.25 37.4 101.8 86.4 101.2 60.1 37.3 22.8 32.4 29.4 18.2 10.860803 10.675748 -10.0366325
1.0340 28.7 24 184.25 71.25 34.4 97.3 100.0 101.9 63.2 42.2 24.0 32.2 27.7 17.7 28.424775 28.041126 7.2706625
1.0502 20.9 24 210.25 74.75 39.0 104.5 94.4 107.8 66.0 42.0 25.6 35.7 30.6 18.8 22.129350 22.073554 0.1696725
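Copying rounded coefficients into a prediction equation by hand is error-prone, and any typo silently changes every prediction. The sketch below shows the safer pattern: predict() applies the fitted coefficients directly, and an explicit equation, if one is wanted, takes each coefficient from coef(). It fits a small model to the ten printed rows purely for illustration.

```r
# Four columns for the first 10 participants, copied from the head() output
d <- data.frame(
  BodyFat = c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7),
  Density = c(1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.0549, 1.0704, 1.0900, 1.0722),
  Weight  = c(154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, 176.00, 191.00, 198.25),
  Height  = c(67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, 74.00, 73.50)
)
fit <- lm(BodyFat ~ Density + Weight + Height, data = d)

# predict() applies the fitted (unrounded) coefficients directly
pred_auto <- predict(fit, newdata = d)

# The same equation written out, with every coefficient taken from coef() rather than typed
b <- coef(fit)
pred_manual <- b[["(Intercept)"]] + b[["Density"]] * d$Density +
  b[["Weight"]] * d$Weight + b[["Height"]] * d$Height

all.equal(unname(pred_auto), pred_manual)  # TRUE
```

On the real data, the equivalent would be bodyfat$bodyfat_pred_3 <- predict(model_3, newdata = bodyfat).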

Returns: Diagnostic plots to check whether each model meets the linear regression assumptions; the models are then compared to choose the best one

Returns: Plot model_1 (residual standard error: 1.654 on 993 degrees of freedom; multiple R-squared: 0.9621)

par(mfrow = c(2, 2))
plot(model_1)

Returns: Plot model_2 (residual standard error: 1.289 on 989 degrees of freedom; multiple R-squared: 0.9767)

par(mfrow = c(2, 2))
plot(model_no_outlier)

Returns: Plot model_3 (residual standard error: 2.664 on 1004 degrees of freedom; multiple R-squared: 0.9006, adjusted R-squared: 0.9003)

par(mfrow = c(2, 2))
plot(model_3)
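The residual standard errors and R-squared values quoted in these headings can be pulled from summary() directly rather than copied by hand. The sketch below defines a hypothetical helper, model_stats(), and applies it to two small models fitted to the ten printed rows. When comparing the real models, note that model_2 was fitted to the 999 observations left after removing outliers, while model_1 and model_3 use all 1008. Their residual standard errors are therefore not computed on the same data.

```r
# Hypothetical helper: collect the statistics used to compare the models
model_stats <- function(fit) {
  s <- summary(fit)
  c(rse    = s$sigma,           # residual standard error
    df     = df.residual(fit),  # residual degrees of freedom
    r2     = s$r.squared,
    adj_r2 = s$adj.r.squared)
}

# Four columns for the first 10 participants, copied from the head() output
d <- data.frame(
  BodyFat = c(12.3, 6.1, 25.3, 10.4, 28.7, 20.9, 19.2, 12.4, 4.1, 11.7),
  Density = c(1.0708, 1.0853, 1.0414, 1.0751, 1.0340, 1.0502, 1.0549, 1.0704, 1.0900, 1.0722),
  Weight  = c(154.25, 173.25, 154.00, 184.75, 184.25, 210.25, 181.00, 176.00, 191.00, 198.25),
  Height  = c(67.75, 72.25, 66.25, 72.25, 71.25, 74.75, 69.75, 72.50, 74.00, 73.50)
)
fit_small <- lm(BodyFat ~ Density, data = d)
fit_large <- lm(BodyFat ~ Density + Weight + Height, data = d)

rbind(small = model_stats(fit_small), large = model_stats(fit_large))
```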

Conclusion

In summary, we performed several analyses on the dataset. These included:

  • restructuring data types and creating new categorical variables;
  • examining univariate distributions and checking each variable's correlation with BodyFat;
  • checking for bias;
  • building three linear regression models (model_1, model_2 and model_3) with their prediction equations;
  • checking for outliers and variable importance;
  • comparing the models by residual standard error and R-squared.

  • Null hypothesis: there is no relationship between an individual's percentage of body fat and body density.
  • After comparing model_1, model_2 and model_3, we concluded that model_2 best represents the relationship between the outcome variable (BodyFat) and the predictors. It has the smallest residual standard error (1.289 on 989 degrees of freedom) and the highest R-squared (0.9767).
  • We reject the null hypothesis. There is a strong negative relationship between the outcome variable (BodyFat) and the predictor variable (Density), whose coefficient is negative with p < 2e-16 in all three models.

 

For the project, see the Project Presentation.

Project